Biology Methods and Protocols
Oxford University Press (OUP)
Preprints posted in the last 90 days, ranked by how well they match the content profile of Biology Methods and Protocols, based on 53 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
Adegbosin, O. T.; Patel, H.
Background: Microsatellite stability status determination is important for prognostication and therapeutic decision-making in colorectal cancer management, but the conventional methods for this assessment are not readily available, especially in low- and middle-income countries. Deep learning (DL) models have been proposed to address this problem; however, the computational cost of model complexity and inadequate explainability may limit their adoption in low-resource settings. This study explored the potential of explainable lightweight models for detection of microsatellite instability in colorectal cancer. Methods: DL models were trained on a public dataset of colorectal cancer histology images and then used to classify a set of test images into one of two classes: microsatellite instability or microsatellite stability. The models were compared for efficiency. Gradient-weighted class activation mapping (Grad-CAM) was used to interpret the models' decision-making. Results: The simpler convolutional neural network (CNN) trained from scratch had modest performance (accuracy = 0.757, area under the receiver-operating characteristic curve [AUROC] = 0.840). With an attention mechanism added, these values increased, but specificity and sensitivity decreased. Pretrained models performed better than those trained from scratch, and EfficientNet_B0 had the best balance of high performance and low computational requirements (accuracy = 0.936, AUROC = 0.990, negative predictive value = 0.923, specificity = 0.953, 4,010,000 trainable parameters, 0.38 gigaFLOPs). However, a simple CNN model with an attention mechanism had the best interpretability based on Grad-CAM. Conclusion: This study demonstrated that DL models that are lightweight compared with previously proposed ones can be useful for colorectal cancer microsatellite instability screening in resource-limited settings while balancing performance and computational efficiency.
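As a concrete illustration of the screening metrics reported above, here is a minimal sketch, on hypothetical toy labels and scores rather than the study's data, of how accuracy, specificity, negative predictive value, and AUROC can be computed with NumPy:

```python
import numpy as np

def auroc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def screening_metrics(y_true, y_pred):
    """Accuracy, specificity, and negative predictive value from hard labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = np.mean(y_true == y_pred)
    spec = tn / (tn + fp)
    npv = tn / (tn + fn)
    return acc, spec, npv

# Toy MSI/MSS example (hypothetical scores, not the study's data)
y = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])
auc = auroc(y, scores)
acc, spec, npv = screening_metrics(y, (scores >= 0.5).astype(int))
```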
Bou Malham, V.; Leandre, F.; Hamimi, A.; Lagoutte, I.; Bouchet, S.; Gougelet, A.; Colnot, S.; Desbois-Mouthon, C.
Background & aims: Constitutive activation of the β-catenin pathway is a determining feature in the pathogenesis of two primary liver cancers, namely hepatocellular carcinoma (HCC) and hepatoblastoma (HB). Activating alterations in the CTNNB1 gene and, to a lesser extent, inhibiting alterations in the APC gene are observed in 30 to 40% of HCC cases and 80 to 90% of HB cases. For both tumours, therapeutic management is far from optimal. Therefore, relevant experimental models are needed to increase our knowledge and test new therapeutic approaches. Methods: Organoids and tumouroids were established from APCΔhep and βcatΔex3 mouse models, which are clinically relevant models for β-catenin-activated HCC and mesenchymal HB. We developed a new methodological approach based on dynamic suspension culture in a rotating bioreactor. Morphological and molecular characteristics, as well as sensitivity to WNTinib, a treatment already successfully tested on human HCC and HB tumouroids, were evaluated by histology, immunohistochemistry, immunofluorescence, and RT-qPCR. Results: This easy-to-implement methodology allows the rapid generation of a large number of organoids and tumouroids that are uniform in size and show no signs of cell death in their core. The robustness of the methodology is illustrated by the maintenance of the histological architecture, cell diversity, and gene expression in organoids and tumouroids in comparison with the native liver tissues. In addition, the value of the HCC-derived tumouroids for evaluating cancer treatment was assessed based on their responsiveness to the β-catenin antagonist WNTinib. Conclusions: The organoids and tumouroids that we present here are new, reliable in vitro cancer models, recapitulating the main features of β-catenin-driven HCC and mesenchymal HB. They can be integrated into an appropriate platform for drug screening and could enable the development of "à la carte" therapies that are urgently needed for these indications.
Impact and implications: This study addresses the critical need for representative in vitro models to investigate β-catenin-driven liver cancers. The organoids and tumouroids developed here are particularly valuable for researchers seeking robust, reproducible models that accurately reflect the cellular diversity and gene expression profiles of native liver tumours. These findings have practical applications in exploring cancer mechanisms, screening new drugs, optimizing personalized treatment strategies, and reducing reliance on animal models, which ultimately benefits patients. Highlights:
- Easy and rapid generation of mouse liver organoids and tumouroids from β-catenin-activated tumours using culture in a bioreactor
- Tumouroids preserve histology, cell diversity, and gene expression of native tissue
- HCC-derived tumouroids respond to the β-catenin inhibitor WNTinib
- These reliable 3D models reduce reliance on animal experiments for drug testing
Bolut, C.; Pacary, A.; Pieruccioni, L.; Ousset, M.; Paupert, J.; Casteilla, L.; Simoncini, D.
Machine learning (ML) models are effective at classifying images across various fields, including biology. However, their performance on biomedical images is often limited by the small size of available datasets, which are constrained by the time-consuming and costly nature of experimental data collection. A review of the literature shows that many studies using biomedical images fail to follow ML best practices. This study focuses on regenerative medicine, which aims to promote tissue regeneration rather than scarring. To explore this process, we applied ML to a limited dataset of images of mouse tissues, aiming to distinguish between regenerating and scarring samples. As expected, binary classification failed to generalize to independent data. A novel SHAP-based analysis revealed that the overfitting models relied on spurious correlations, including individual mouse characteristics that aligned with the regeneration/scarring labels. The models appeared to be solving the binary classification task but were in fact recognizing individuals. To investigate this behavior further, we examined the test-set confusion matrix of a model trained to identify individual mice. We observed that, beyond individual recognition, individuals were grouped according to the time elapsed after injury (day 3 or 10) and the healing outcome (regeneration or scarring). We hypothesized that these groupings were based on relevant biological information captured by the model. To test this hypothesis, we successfully trained a model to classify images according to the time elapsed after injury (3 or 10 days), demonstrating that ML can extract relevant biological information when the task is aligned with what the data can actually support. Altogether, this study demonstrates that carefully examining a model's explanations is not only an effective way to unveil putative biases but also a way to extract relevant information from a limited dataset.
Author summary: Machine learning is increasingly used to analyze biomedical images, but in many experimental settings only small datasets are available, which can easily mislead powerful models. In this study, we looked at images from mouse tissues, with the goal of distinguishing healing by regeneration from healing by scarring. Although standard machine learning models appeared to perform well during training, they failed to generalize to new animals. By carefully analyzing model explanations, we found that the models were not learning biologically meaningful patterns of tissue repair, but instead were recognizing individual mice based on subtle image-specific signatures. Importantly, this same analysis revealed that the models did capture relevant biological information when the task was better aligned with the data, such as distinguishing early versus late stages of healing. Our results highlight how explanation methods can uncover hidden biases, prevent false conclusions, and help researchers extract meaningful biological insights even from limited and imperfect datasets.
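The subject-wise generalization failure described above can be reproduced on a tiny synthetic example. Everything here (the mouse "signatures", counts, and the 1-nearest-neighbour classifier) is a hypothetical stand-in for the paper's models and data, chosen only to show why image-wise splits inflate accuracy when labels align with individuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sketch (hypothetical data, not the paper's): 8 mice, 20 images
# each. Every mouse has a strong image "signature"; labels alternate by mouse,
# mimicking the spurious individual-recognition shortcut.
n_mice, per_mouse = 8, 20
mouse_id = np.repeat(np.arange(n_mice), per_mouse)
labels = mouse_id % 2                       # 0 = regeneration, 1 = scarring
x = 10.0 * mouse_id + rng.normal(0, 0.5, n_mice * per_mouse)

def knn1_acc(train, test):
    """1-nearest-neighbour accuracy: each test image copies the label of
    the closest training image."""
    nearest = np.abs(x[test, None] - x[None, train]).argmin(axis=1)
    return np.mean(labels[train][nearest] == labels[test])

# Image-wise (random) split: neighbours come from the *same mouse*,
# so the classifier looks perfect while learning nothing about healing.
idx = rng.permutation(len(x))
random_acc = knn1_acc(idx[40:], idx[:40])

# Mouse-wise split: held-out mice were never seen, and accuracy collapses.
test = np.where(np.isin(mouse_id, [0, 1]))[0]
train = np.where(~np.isin(mouse_id, [0, 1]))[0]
group_acc = knn1_acc(train, test)
```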
Nieto Estrada, V. H.; Aya Porto, A. C.; Cardona Zorrilla, A. F.; Pulido Ramirez, E. O.; Trujillo Gordillo, H.; Sanchez Pineros, N. G.; Wagner Gutierrez, N.; Arrieta, O.; Molano, D. F.; Rolfo, C.; Nigita, G.; Nates, J.
Background: Prognostic assessment in critically ill patients with cancer remains challenging, as conventional ICU severity scores often perform suboptimally in this population. Machine learning (ML) approaches may improve outcome prediction by integrating acute physiology, organ dysfunction, and oncologic variables. We aimed to develop and validate ML-based models to predict ICU mortality and 30-day survival in critically ill cancer patients. Methods: We conducted a retrospective cohort study including 997 critically ill cancer patients admitted to the ICU. Forty-eight demographic, oncologic, physiological, laboratory, and therapeutic variables collected at ICU admission were used to train and validate ML models. Eight algorithms were evaluated using stratified cross-validation with feature selection and hyperparameter optimization. Model performance was assessed using discrimination, calibration, and classification metrics. Model interpretability was explored using Shapley additive explanations (SHAP). Results: CatBoost achieved the best performance for ICU mortality prediction (AUROC 0.96), showing excellent discrimination and calibration, and outperforming other ML models. Prediction of 30-day survival was less accurate (best AUROC 0.75), reflecting the influence of post-ICU factors not captured at admission. Key predictors of ICU mortality included severity of organ dysfunction, therapeutic objectives, vasopressor and methylene blue use, SAPS III score, lactate, platelet count, and blood urea nitrogen. For 30-day survival, baseline physiological status, admission type, SAPS III, lactate, creatinine, age, and body mass index were most relevant. SHAP analysis demonstrated that acute physiology and organ dysfunction, rather than cancer diagnosis alone, primarily drove short-term outcomes. Conclusions: ML-based models, particularly CatBoost, outperformed traditional prognostic tools for predicting ICU mortality in critically ill cancer patients.
Cancer was not an independent predictor of short-term mortality; outcomes were primarily driven by pre-ICU conditions, acute physiology, and severity of organ dysfunction. External validation is needed to confirm generalizability and support future integration of ML-based prediction tools into clinical decision-making in oncologic critical care.
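A minimal sketch of the stratified cross-validation evaluation described above; scikit-learn's GradientBoostingClassifier and a synthetic imbalanced dataset are hypothetical stand-ins for CatBoost and the actual ICU cohort:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic "admission variables" with class imbalance, standing in
# for the real cohort (hypothetical data, illustrative only).
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)

# Stratified 5-fold CV preserves the mortality rate in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = []
for train_idx, test_idx in cv.split(X, y):
    model = GradientBoostingClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])[:, 1]   # predicted mortality risk
    aucs.append(roc_auc_score(y[test_idx], p))
mean_auc = float(np.mean(aucs))
```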
Srinivasan, A.; Sritharan, D. V.; Chadha, S.; Fu, D.; Hossain, J. O.; Breuer, G. A.; Aneja, S.
Purpose: Deep learning models are increasingly being used in medical diagnostics, but their vulnerability to adversarial perturbations raises concerns about their reliability in clinical applications. Capsule networks (CapsNets) are a promising architecture for medical imaging tasks, given their ability to model spatial relationships and train with smaller amounts of data. Although previous studies have focused on adversarial training approaches to improve robustness, exploring alternative architectures is an underexplored direction for combating poor adversarial stability. Prior work has suggested that CapsNets may exhibit improved robustness to adversarial perturbations compared to convolutional neural networks (CNNs), but performance on adversarial images has not been studied systematically in clinical environments. We evaluated the robustness of CapsNets compared to CNNs and vision transformers (ViTs) across multiple medical image classification tasks. Methods: We trained two CNNs (ResNet-18 and ResNet-50), one ViT (MedViT), and two CapsNets (DR-CapsNet and BP-CapsNet) on four distinct medical imaging datasets (PneumoniaMNIST, BreastMNIST, NoduleMNIST3D, and BloodMNIST) and one natural image dataset (MNIST). Models were evaluated on adversarial examples generated by projected gradient descent and the fast gradient sign method across a range of perturbation bounds. Interpretability experiments, including latent space and Gradient-weighted Class Activation Mapping (Grad-CAM) analyses, were conducted to better understand model stability on adversarial inputs. Results: CapsNets demonstrated superior robustness under adversarial perturbations compared to CNNs and ViTs across all medical imaging datasets and the natural image dataset.
Latent space and Grad-CAM visualizations revealed that CapsNets maintained more consistent embedding representations and attention maps after adversarial perturbations compared to CNNs and ViTs, suggesting that advantages in CapsNet robustness are supported, at least in part, by more stable feature encodings. Bayes-Pearson routing further improved robustness over standard dynamic routing in CapsNets without compromising baseline performance, suggesting a potential architectural improvement. Conclusion: CapsNets exhibit intrinsic advantages in adversarial robustness over CNN- and ViT-based models on medical imaging tasks, suggesting they are a reliable alternative for medical image classification. These findings support the use of CapsNets in clinical applications where model reliability is critical.
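The fast gradient sign method used above to generate adversarial examples can be sketched in a few lines; the logistic model here is a hypothetical stand-in for the CNNs, ViTs, and CapsNets evaluated in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """Fast gradient sign method for a logistic model p = sigmoid(w @ x):
    step the input in the sign of the loss gradient to increase the loss."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w            # d(cross-entropy)/dx for y in {0, 1}
    return x + eps * np.sign(grad_x)

# Toy model and input (hypothetical weights, not from the study)
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, -0.1, 0.4])
y = 1                               # true class
clean_p = sigmoid(w @ x)            # confidence on the clean input
adv_p = sigmoid(w @ fgsm(x, y, w, eps=0.5))   # confidence after attack
```

Even this tiny perturbation bound flips the toy model's prediction, which is exactly the failure mode robustness evaluations probe.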
Sarwin, G.; Ricciuti, V.; Staartjes, V. E.; Carretta, A.; Daher, N.; Li, Z.; Regli, L.; Mazzatenta, D.; Zoli, M.; Seungjun, R.; Konukoglu, E.; Serra, C.
Background and Objectives: We report the first intraoperative deployment of a real-time machine vision system in neurosurgery, derived from our previous anatomical detection work, automatically identifying structures during endoscopic endonasal surgery. Existing systems demonstrate promising performance in offline anatomical recognition, yet so far none have been implemented during live operations. Methods: A real-time anatomy detection model was trained using the YOLOv8 architecture (Ultralytics). Following training in the PyTorch environment, the model was exported to ONNX format and further optimized using the NVIDIA TensorRT engine. Deployment was carried out using the NVIDIA Holoscan SDK; the system ran on an NVIDIA Clara AGX developer kit. We used the model for real-time recognition of intraoperative anatomical structures and compared its output with the same video labelled manually as a reference. Model performance was reported using the average precision at an intersection-over-union threshold of 0.5 (AP50). Furthermore, the end-to-end delay from frame acquisition to the display of the annotated output was measured. Results: A mean AP50 of 0.56 was achieved. The model demonstrated reliable detection of the most relevant landmarks in the transsphenoidal corridor. The mean end-to-end latency of the model was 47.81 ms (median 46.57 ms). Conclusion: For the first time, we demonstrate that clinical-grade, real-time machine-vision assistance during neurosurgery is feasible and can provide continuous, automated anatomical guidance from the surgical field. This approach may enhance intraoperative orientation, reduce cognitive load, and offer a powerful tool for surgical training. These findings represent an initial step toward integrating real-time AI support into routine neurosurgical workflows.
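The end-to-end latency measurement described above can be sketched as a simple per-frame timing harness; the sleeping "pipeline" below is a placeholder for the real TensorRT inference and annotation step, not the authors' code:

```python
import statistics
import time

def measure_latency(pipeline, frames, warmup=2):
    """Per-frame end-to-end latency in milliseconds, from frame handoff to
    annotated output, with a few warm-up frames excluded (engine init,
    caches). Returns (mean_ms, median_ms)."""
    for f in frames[:warmup]:
        pipeline(f)                                   # warm up, not timed
    latencies = []
    for f in frames[warmup:]:
        t0 = time.perf_counter()
        pipeline(f)                                   # detect + draw overlay
        latencies.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(latencies), statistics.median(latencies)

# Stand-in "pipeline": a 1 ms sleep in place of real inference.
mean_ms, median_ms = measure_latency(lambda f: time.sleep(0.001),
                                     list(range(12)))
```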
Reinosa, R.
Introduction: The precise determination of diagnostic cut-off points is essential for the development of multimarker panels in oncology. In previous work on pulmonary nodules, it was observed that the standard two-parameter logistic fit could be insufficient for biomarkers with asymmetric distributions. Furthermore, the calculation of empirical cut-off points based on graphical visualization presented limitations in precision and reproducibility. Objective: This study presents a methodological advancement in the data analysis phase (Stage 1), introducing new Python algorithms for the direct analytical calculation of empirical intersections and robust mathematical modeling using Dual Annealing with both two-parameter and four-parameter logistic functions. This improved methodology feeds into the ThresholdXpert 1.0 software tool for combinatorial optimization of biomarker panels (Stage 2), and is applied here to the diagnostic challenge of hepatocellular carcinoma (HCC). Methods: The methodology was first validated by re-analyzing a dataset of patients with pulmonary nodules (N=895). It was subsequently applied to an HCC dataset derived from the cohort of Jang et al. (208 HCC, 193 cirrhosis, 401 total), randomly divided into a training set (280) and an independent test set (121). Scripts were developed to compare the previous two-parameter logistic fit with the new two- and four-parameter logistic models. Finally, ThresholdXpert 1.0 was used for multimarker panel optimization. Results: The integration of empirical calculation, logistic modeling, and combinatorial optimization through ThresholdXpert 1.0 provides a robust and coherent framework for the development of multimarker diagnostic panels. The four-parameter logistic model provided additional validation without substantially modifying cut-off values for most biomarkers, confirming the stability of the approach while offering greater flexibility for complex distributions.
When applied to hepatocellular carcinoma, the framework identified a molecular panel composed of AFP, PIVKA-II, OPN, and DKK-1 with sensitivity of 0.77 and specificity of 0.72, and an optimized panel incorporating inverse MELD that achieved the best overall balance (sensitivity 0.73, specificity 0.75) in independent external validation. These results demonstrate the potential of this approach as a generalizable tool for the optimized design of binary diagnostic systems in oncology. Conclusion: The integration of complementary mathematical modeling enhances the capability of ThresholdXpert 1.0 to identify robust diagnostic panels, as in some cases a single biomarker may outperform biomarker combinations, and vice versa. This approach enabled the integration of molecular biomarkers and clinical variables under a unified mathematical framework. Contact: roberto117343@gmail.com
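A minimal sketch of fitting a four-parameter logistic function with SciPy's dual annealing, as named in the methods above; the synthetic data, curve parameters, and bounds are illustrative, not taken from the study:

```python
import numpy as np
from scipy.optimize import dual_annealing

def logistic4(x, a, b, c, d):
    """Four-parameter logistic: asymptotes a (low-x) and d (high-x),
    inflection point c, slope b."""
    return d + (a - d) / (1.0 + (x / c) ** b)

# Synthetic biomarker-vs-response curve (hypothetical, low noise)
rng = np.random.default_rng(1)
x = np.linspace(0.1, 10, 60)
true_params = (1.0, 2.0, 3.0, 0.0)
y = logistic4(x, *true_params) + rng.normal(0, 0.01, x.size)

def sse(params):
    """Sum of squared errors, the objective minimized globally."""
    return np.sum((logistic4(x, *params) - y) ** 2)

bounds = [(0, 2), (0.5, 5), (0.5, 8), (-1, 1)]
res = dual_annealing(sse, bounds)
a_hat, b_hat, c_hat, d_hat = res.x
```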
Giri, R.; Agrawal, R.; Lamichhane, S. R.; Barma, S.; Mahatara, R.
We are pleased to submit our original article entitled "Assessing medication-related burden and medication adherence among older patients from Central Nepal: A machine learning approach" for consideration in your esteemed journal. In this paper, we assessed medication burden using the validated Living with Medicines Questionnaire (LMQ-3) and medication adherence using the Adherence to Medication Refills (ARMS) Scale. We analysed our results through a machine learning approach, rather than a traditional statistical approach, to identify the complex factors influencing both. Six ML architectures (ordinary least squares, LightGBM, Random Forest, XGBoost, SVM, and penalized linear regression) were employed to predict ARMS and LMQ scores using various socio-demographic, clinical, and medication-related predictive features. Model explainability was provided through SHAP (Shapley Additive exPlanations). Our study identified moderate medication burden with moderate non-adherence among older adults. Requiring assistance with medication and polypharmacy were the strongest drivers of medication burden and non-adherence. The high predictive accuracy of the ML models suggests appropriate clinical interventions, such as deprescribing, to address the highly prevalent medication burden and non-adherence among older adults in Nepal.
Melhuish, T. A.; Adair, S. J.; Pemberton, O. S.; Bauer, T. W.; Wotton, D.
Low take rates and inter-tumor variability in growth rates can limit the effectiveness of mouse xenograft models when comparing between groups. To address this problem, we developed a simple method to compare multiple cell types within a single mixed xenograft. Individual cell lines or clones were transduced with a lentiviral vector that includes a unique PCR tag, allowing the use of qPCR to determine the proportion of each tagged cell type within a mixed xenograft tumor. We generated vectors with six distinct PCR tags and two different selectable markers, and have optimized the approach for determining their relative proportions within a mix. An initial pre-amplification step is used to increase the amount of material for subsequent qPCR reactions. This also removes the bulk of the genomic DNA, increasing the specificity of the qPCR step. Samples are then used for qPCR with specific pairs of primers that distinguish between each of the individual PCR tags, and the relative proportion of each tag is determined relative to that in the starting mix. We have tested this approach for in vitro growth of mixed cell cultures and in an orthotopic cecal xenograft model using a human colon cancer cell line. Since each individual tumor is initiated with a mix of cells, multiple tumors within a single animal can be analyzed separately, and overall tumor size is not important. Similarly, multiple metastatic lesions from the same animal can be analyzed individually. Thus, each tumor provides a direct comparison between individually tagged cell lines or clones. This low-throughput "bar-coding" approach is simple and cost-effective and has the potential to reduce the number of animals needed for xenograft experiments.
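The relative-proportion calculation described above resembles a standard 2^-ΔCt comparison against the starting mix; here is a minimal sketch with hypothetical Ct values, not the authors' exact normalization scheme:

```python
def relative_abundance(ct_sample, ct_mix):
    """Fold change of each PCR tag in a tumour sample relative to the
    starting mix via 2^(Ct_mix - Ct_sample), then normalised so the
    estimated proportions sum to 1 (illustrative simplification)."""
    fold = {tag: 2.0 ** (ct_mix[tag] - ct_sample[tag]) for tag in ct_sample}
    total = sum(fold.values())
    return {tag: f / total for tag, f in fold.items()}

# Hypothetical Ct values: tag A crosses threshold 2 cycles earlier in the
# tumour than in the mix (4-fold enrichment); tag B is unchanged.
mix = {"A": 24.0, "B": 24.0}
tumor = {"A": 22.0, "B": 24.0}
props = relative_abundance(tumor, mix)
```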
Brulhart, D.; Magini, G.; Schafer, A.; Schwab, S.; Held, U.
Objectives: Clinical prediction models estimate the risk of a future outcome in patients. Such models are often externally validated using independent datasets; however, even when a model has been rigorously validated in a new setting and patient population, its performance across other clinical settings remains unclear. Therefore, we systematically evaluated model performance and clinical utility across diverse patient populations to quantify the limits of transportability. Methods: Using liver transplantation as an example, we used the UK donation-after-circulatory-death (DCD) risk score and descriptive statistics from Swiss DCD liver transplant populations to simulate realistic target populations with varying donor and recipient characteristics. The risk score's ability to predict one-year graft failure was evaluated using the calibration intercept, calibration slope, area under the receiver operating characteristic (ROC) curve, and net benefit. Results: The UK DCD Risk Score's performance depended heavily on the simulated population characteristics. While the score performed adequately in settings similar to those where it was derived, it was not satisfactory in others. Discussion: The study showed, using a risk score in liver transplantation as an example, that the application of a prediction model can be limited in external populations that differ from the derivation population, and that its transportability to new settings is not guaranteed. Conclusion: This study highlights the importance of external validation of clinical prediction models to determine transportability to various target populations. Their application requires careful consideration and potential model re-estimation.
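The calibration intercept and slope used above are commonly estimated by logistic recalibration (regressing the observed outcome on the logit of the predicted risk); a sketch on simulated, well-calibrated risks, where the expected slope is 1 and the intercept 0:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope_intercept(p_pred, y):
    """Logistic recalibration: fit y ~ logit(p_pred). Slope near 1 and
    intercept near 0 indicate good calibration (illustrative sketch;
    a very large C makes the fit effectively unpenalized)."""
    lp = np.log(p_pred / (1.0 - p_pred)).reshape(-1, 1)
    m = LogisticRegression(C=1e6, max_iter=1000).fit(lp, y)
    return float(m.coef_[0, 0]), float(m.intercept_[0])

# Simulate a well-calibrated score: outcomes drawn from the stated risks.
rng = np.random.default_rng(7)
p = rng.uniform(0.05, 0.95, 5000)
y = (rng.uniform(0, 1, 5000) < p).astype(int)
slope, intercept = calibration_slope_intercept(p, y)
```

A miscalibrated score (e.g. systematically overestimated risks) would show up here as an intercept well below 0 or a slope far from 1.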
Skobelev, K.; Fithian, E.; Baranovski, Y.; Cook, J.; Angara, S.; Otto, S.; Yi, Z.-F.; Zhu, J.; Donoho, D. A.; Han, X. Y.; Mainkar, N.; Masson-Forsythe, M.
Recent Artificial Intelligence (AI) models have matched or exceeded human experts in several benchmarks of biomedical task performance, but have lagged behind on surgical image-analysis benchmarks. Since surgery requires integrating disparate tasks, including multimodal data integration, human interaction, and physical effects, generally capable AI models could be particularly attractive as a collaborative tool if performance could be improved. On the one hand, the canonical approach of scaling architecture size and training data is attractive, especially since millions of hours of surgical video data are generated per year. On the other hand, preparing surgical data for AI training requires significantly higher levels of professional expertise, and training on that data requires expensive computational resources. These trade-offs paint an uncertain picture of whether, and to what extent, modern AI could aid surgical practice. In this paper, we explore this question through a case study of surgical tool detection using state-of-the-art AI methods available in 2026. We demonstrate that even with multi-billion-parameter models and extensive training, current Vision Language Models fall short in the seemingly simple task of tool detection in neurosurgery. Additionally, we show scaling experiments indicating that increasing model size and training time only leads to diminishing improvements in relevant performance metrics. Thus, our experiments suggest that current models could still face significant obstacles in surgical use cases. Moreover, some obstacles cannot simply be "scaled away" with additional compute and persist across diverse model architectures, raising the question of whether data and label availability are the only limiting factors. We discuss the main contributors to these constraints and advance potential solutions.
Spyretos, C.; Tampu, I. E.; Lindblad, J.; Haj-Hosseini, N.
Abstract: The classification of pediatric brain tumors is investigated using deep learning on hematoxylin and eosin (H&E) and antigen Ki-67 (Ki-67) whole slide images (WSIs) from the Children's Brain Tumor Network (CBTN) dataset. A total of 1,662 unregistered WSIs (1,047 H&E and 615 Ki-67 images) were analyzed, including low-grade glioma/astrocytoma (grades 1, 2) (LGG), high-grade glioma/astrocytoma (grades 3, 4) (HGG), medulloblastoma (MB), ependymoma (EP), and ganglioglioma. The aim of this study was to effectively classify pediatric brain tumors using H&E and Ki-67 WSIs individually, and to investigate whether early, intermediate, and late fusion could improve the predictive performance. From each WSI, 224×224-pixel patches were extracted, and the instance (patch)-level features were obtained using the histology foundation model CONCHv1_5. The instances were aggregated using clustering-constrained attention multiple instance learning (CLAM) for patient-level classification. Model interpretability and explainability were assessed through attention heatmaps, cell density, and Ki-67 labelling index (LI) maps. In the binary grade classification between LGG and HGG, the intermediate concatenation fusion achieved the best performance with a balanced accuracy of 0.88 ± 0.05 (p < 0.005), compared to the single-stain models (H&E: 0.84 ± 0.05, Ki-67: 0.86 ± 0.05). For the 5-class tumor type classification, the one-hidden-layer late fusion model achieved the highest balanced accuracy of 0.83 ± 0.04 (p < 0.005), outperforming the single-stain models (H&E: 0.77 ± 0.05, Ki-67: 0.74 ± 0.05). Overall, most of the fusion approaches outperformed the single-stain models in both classification tasks (p < 0.005).
The Ki-67 attention maps demonstrated moderate to strong Spearman correlation (ρ = 0.576 to 0.823) with the cell density and Ki-67 LI maps, suggesting that these features are associated with the models' predictions, although additional features may contribute. The results show that H&E and Ki-67 images provide complementary information, and that most of the multi-stain fusion approaches using deep learning improve pediatric brain tumor diagnosis.
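Attention-based multiple instance learning aggregation of patch features, as in CLAM, reduces at its core to a softmax-weighted pooling of patch embeddings; here is a simplified sketch with random toy embeddings (no clustering constraint and no learned gating, both of which the real method adds):

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    z = z - z.max()                 # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attention_mil_pool(patches, w_attn):
    """Simplified attention MIL pooling: score each patch embedding,
    softmax the scores, and return the weighted slide-level embedding
    plus the attention weights (the values rendered as heatmaps)."""
    scores = patches @ w_attn       # one attention score per patch
    weights = softmax(scores)
    slide_embedding = weights @ patches   # convex combination of patches
    return slide_embedding, weights

# Hypothetical toy bag: 5 patch embeddings of dimension 8
patches = rng.normal(size=(5, 8))
w_attn = rng.normal(size=8)
emb, w = attention_mil_pool(patches, w_attn)
```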
Wang, J.; Yang, Z.; Zhu, Z.; Zhu, X.; Huang, Z.; Wang, H.; Tian, L.; Cao, Y.; Qu, X.; Qi, X.; Wu, B.
Background: LLMs enable patient-facing conversational agents, creating a pathway toward digital twins that capture older adults' lived experiences and behavioral responses across time. A central barrier is personality drift (inconsistent trait expression across repeated interactions), which undermines the reliability of generated trajectories and intervention-response simulation in geriatric care. Objective: To develop ELDER-SIM, a multi-role elderly-care conversational platform for building personality-stable digital twin agents, and to propose a psychometric validation framework for quantifying personality consistency in LLM-based agents. Methods: ELDER-SIM was implemented via n8n workflow orchestration with local LLM inference (Ollama/vLLM), integrating (1) Big Five (OCEAN) trait specifications, (2) a Cognitive Conceptualization Diagram (CCD) grounded in Beck's CBT framework, and (3) a MySQL-based long-term memory module. Ablation studies across four conditions (Baseline, +Memory, +CCD, and +LoRA, fine-tuned on 19,717 instruction pairs from CHARLS) were evaluated via Cronbach's α, ICC, and role discrimination accuracy. Results: Personality measurement reliability was acceptable to excellent across conditions (Cronbach's α: 0.70-0.94), with consistently high test-retest stability (ICC: 0.85-0.96). Role discrimination improved stepwise from 83.3% (Baseline) to 88.9% (+Memory), 94.4% (+CCD), and 97.2% (+LoRA). CCD produced the largest gain in internal consistency (mean α 0.702 → 0.892), while LoRA achieved the highest overall internal consistency (α = 0.940) and ICC (0.958). Conclusions: ELDER-SIM provides a psychometrically validated approach for constructing personality-consistent elderly digital twin agents. Structured cognitive modeling and domain adaptation reduce personality drift, supporting reliable longitudinal simulation for elderly mental health care and reproducible in silico evaluation before clinical deployment.
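Cronbach's α, used in the reliability analysis above, can be computed directly from an item-score matrix; a minimal sketch with hypothetical trait-item responses (perfectly parallel items, so α = 1 by construction):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical Big Five item responses from four repeated agent interviews;
# the two items covary perfectly, giving maximal internal consistency.
scores = np.array([[4, 4],
                   [2, 2],
                   [5, 5],
                   [3, 3]])
alpha = cronbach_alpha(scores)
```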
Pham, T. D.
Objective: This study investigates whether incorporating physiological coupling concepts into neural network design can support stable and interpretable feature learning for histopathological image classification under limited data conditions. Methods: A physiologically inspired architecture, termed CardioPulmoNet, is introduced to model interacting feature streams analogous to pulmonary ventilation and cardiac perfusion. Local and global tissue features are integrated through bidirectional multi-head attention, while a homeostatic regularization term encourages balanced information exchange between streams. The model was evaluated on three histopathological datasets involving oral squamous cell carcinoma, oral submucous fibrosis, and heart failure. In addition to end-to-end training, learned representations were assessed using linear support vector machines to examine feature separability. Results: CardioPulmoNet achieved performance comparable to several pretrained convolutional neural networks across the evaluated datasets. When combined with a linear classifier, improved classification performance and a higher area under the receiver operating characteristic curve were observed, suggesting that the learned feature embeddings are well structured for downstream discrimination. Conclusion: These results indicate that physiologically motivated architectural constraints may contribute to stable and discriminative representation learning in computational pathology, particularly when training data are limited. The proposed framework provides a step toward integrating physiological modeling principles into medical image analysis and may support future development of transferable and interpretable learning systems for histopathological diagnosis.
Veeramani, S.; Yin, C.; Yu, N.; Coleman, K. L.; Smith, B. J.; Weiner, G. J.
Background: Therapeutic agents targeting the PD1-PDL1 interaction are of great clinical value; however, accurately predicting which patients are most likely to benefit is challenging. Improved predictive biomarkers for anti-PD1 therapy are clearly needed. Quantifying PD1 saturation by PDL1 in tumor tissue has the potential to serve as such a biomarker. Here we report a novel bioassay called the PD1 Ligand Receptor Complex Aptamer (LIRECAP) assay and demonstrate that it can be used to quantify the saturation of PD1 by PDL1 in formalin-fixed paraffin-embedded tumor biospecimens. Results: The PD1 LIRECAP assay was developed by identifying a pair of RNA aptamers. One aptamer preferentially binds to unoccupied PD1 (P aptamer) and the other to the PD1-PDL1 complex (C aptamer). P and C aptamers were added together to a formalin-fixed sample, and bound aptamer was extracted. A 2-color qRT-PCR assay using a single set of primers was used to determine the ratio of the sample-bound C to P aptamers (C:P ratio), which reflected PD1 saturation by PDL1 in the sample. Quantification of PD1 saturation by PDL1 as determined by the PD1 LIRECAP assay correlated closely with PD1-mediated signaling and PD1-PDL1 proximity. Analysis of sarcoma FFPE biospecimens confirmed that the assay is technically reproducible on clinical biospecimens. There were significant differences in PD1 saturation by PDL1 between patients, as well as considerable intratumoral heterogeneity. Conclusions: The PD1 LIRECAP assay is a novel assay that can be used to quantify PD1 saturation by PDL1 in clinical biospecimens. The assay is technically feasible and reproducible, and has the potential to serve as a superior predictive biomarker for PD1/PDL1-based therapy. Similar assays based on this platform could be used in other systems and settings to quantify the interaction between two molecules.
Li, L. Y.; Lebiecka-Johansen, B.; Byberg, S.; Thambawita, V.; Hulman, A.
Diabetic retinopathy (DR) is a leading cause of vision impairment, requiring accurate and scalable diagnostic tools. Foundation models are increasingly applied to clinical imaging, but concerns remain about their calibration. We evaluated DINOv3, RETFound, and VisionFM for DR classification using different transfer learning strategies in BRSET (n = 16,266) and mBRSET (n = 5,164). Models achieved high discrimination in binary classification (normal vs retinopathy) in BRSET (AUROC 0.90-0.98), with DINOv3 performing best under full fine-tuning (AUROC 0.98 [95% CI: 0.97-0.99]). External validation on mBRSET showed decreased performance for all models regardless of the fine-tuning strategy (AUROC 0.70-0.85), though fine-tuning improved performance. Foundation models achieved strong discrimination but poor calibration, generally overestimating DR risk. While the generalist model, DINOv3, benefited from deeper fine-tuning, miscalibration remained evident. These findings underscore the need for improved calibration and comprehensive evaluation of foundation models, both of which are essential in clinical settings. Author summary: Artificial intelligence is increasingly being used to detect eye diseases such as diabetic retinopathy from retinal images. Recent advances have introduced "foundation models," which are trained on large datasets and can be adapted to new tasks. We aimed to evaluate how well these models perform in a clinical prediction context, with a focus not only on accuracy but also on how reliably they estimate disease risk. In this study, we compared different types of foundation models using two independent datasets from Brazil. We found that while these models were generally good at distinguishing between healthy and diseased eyes, their predicted risks were often poorly calibrated. In other words, the estimated probabilities did not consistently reflect the true likelihood of disease.
We also examined whether adapting the models to the target population could improve performance. Although this approach led to improvements, calibration issues remained. However, post-training correction improved the agreement between predicted risks and observed outcomes. Our findings highlight an important gap between model performance and clinical usefulness. We suggest that improving the reliability of risk estimates is essential before such systems can be safely used in healthcare.
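The "post-training correction" mentioned above is not named in this summary; temperature scaling is one standard option for that kind of recalibration. The self-contained sketch below uses synthetic data and a simple grid-search fit (both assumptions made for brevity): an overconfident model's logits are divided by a learned temperature T, shrinking predicted risks toward the observed event rate without changing discrimination.

```python
import numpy as np

def temperature_scale(logits, labels, temps=np.linspace(0.25, 4.0, 376)):
    """Post-hoc recalibration by temperature scaling: pick the T that
    minimizes the negative log-likelihood of sigmoid(logit / T).
    Grid search is used here for simplicity; a 1-D optimizer would
    normally be preferred."""
    logits = np.asarray(logits, float)
    labels = np.asarray(labels, float)

    def nll(t):
        p = 1.0 / (1.0 + np.exp(-logits / t))
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    return min(temps, key=nll)

# Simulate an overconfident binary classifier: its logits exaggerate the
# true evidence by 3x, so the fitted temperature should come out > 1.
rng = np.random.default_rng(0)
true_logit = rng.normal(0, 1, 2000)
labels = (rng.random(2000) < 1 / (1 + np.exp(-true_logit))).astype(float)
overconfident = 3.0 * true_logit
T = temperature_scale(overconfident, labels)
```

Because dividing logits by T > 1 is a monotone transform, AUROC is unchanged; only the probability estimates move, which is exactly the discrimination-versus-calibration gap the study highlights.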
Protserov, S.; Repalo, A.; Mashouri, P.; Hunter, J.; Masino, C.; Madani, A.; Brudno, M.
Machine learning models have seen considerable success in the medical image segmentation domain. However, one of the challenges they face is confounders, or shortcuts: spurious correlations or biases in the training data that affect the resulting models. One example of such confounders in surgical machine learning is the setup of surgical equipment, including tools and lighting. Using the task of identifying safe and dangerous zones of dissection in laparoscopic cholecystectomy images and videos as a use case, we inspect two equipment-induced biases: the presence of surgical tools in the field of view and the position of lighting. We propose methods for evaluating the severity of these biases and augmentation-based methods for mitigating them. We show that our tool bias mitigations improve the models' consistency under tool movements by 9 percentage points in the most inconsistent cases, and by 4 percentage points on average. Our lighting bias mitigations reduce the fraction of true dangerous-zone pixels that may be predicted as safe under lighting changes from 5% to 1.5%, without compromising segmentation quality.
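The paper's exact consistency metric and augmentation recipe are not given in this summary. As an illustrative assumption only: consistency under tool movement can be scored as the IoU between predicted masks before and after a perturbation, and the mitigation can be sketched as training-time synthetic occlusion, so the model sees the same anatomy with and without an instrument present.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union between two boolean segmentation masks -
    one simple way to score prediction consistency under a perturbation
    such as simulated tool movement (illustrative; the paper's metric
    may differ)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def occlude(img, x, y, w, h, value=0):
    """Augmentation sketch: paste a rectangular synthetic 'tool' occluder
    into the frame at (x, y) with size (w, h)."""
    out = img.copy()
    out[y:y + h, x:x + w] = value
    return out

# Identical predictions score IoU 1.0; a mask shifted by one row scores less.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
shifted = np.zeros((8, 8), bool); shifted[3:7, 2:6] = True
same_score = mask_iou(pred, pred)
shift_score = mask_iou(pred, shifted)
occ = occlude(np.ones((8, 8)), 1, 1, 3, 3)  # 9 pixels zeroed out
```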
Brault-Boixader, N.; Roca-Ventura, A.; Delgado-Gallen, S.; Buloz-Osorio, E.; Perellon-Alfonso, R.; Hung Au, C.; Bartres-Faz, D.; Pascual-Leone, A.; Tormos Munoz, J. M.; Abellaneda-Perez, K.; Prehabilita Working Group
Prehabilitation (PRH) is a preoperative process aimed at optimizing patients' functional capacity to improve surgical outcomes and overall well-being. While its physical and cognitive benefits are increasingly documented, its emotional impact, particularly in neuro-oncology patients, remains less explored. This study assessed the psychological effects of a PRH program on 29 brain tumor patients. The primary outcome, emotional well-being, was measured using quality of life and emotional distress metrics. Secondary outcomes included perceived stress levels and control attitudes. Additionally, qualitative data from structured interviews provided further insights into the psychological effects of the intervention. The results indicated significant improvements in quality of life and reductions in emotional distress, particularly among women. While perceived stress levels remained stable, control attitudes showed an increase. Qualitative analysis further highlighted positive changes in the sense of control and identified additional factors, such as the importance of social support sources during the PRH process. Overall, these findings suggest that PRH interventions play a significant role in enhancing emotional well-being among neuro-oncological patients in the preoperative phase. These results underscore the importance of implementing comprehensive and personalized PRH approaches to optimize clinical status both before and after surgery, thereby promoting sustained psychological benefits in this population. This study is based on data collected at Institut Guttmann in Barcelona in the context of the Prehabilita project (ClinicalTrials.gov identifier: NCT05844605; registration date: 06/05/2023).
Upadhyaya, D. J.; Schabath, M. B.; Hoogland, A. I.; Brady-Nicholls, R.
Purpose: Patient-reported outcomes (PROs) provide a quantitative measure of a patient's quality of life, directly from the patient without external influence or interpretation. Prior studies have demonstrated correlations between individual PROs and cancer treatment response. However, this area of research is still highly understudied, and patient data often go ignored. Our previous work has shown how changes in insomnia can be used to make binary decisions about a patient's future volume response. Here, we expand upon that work to determine precisely when treatment progression will occur, providing an opportunity for clinicians to intervene sooner. Experimental Design: This study analyzed PROs and tumor volume data collected from 80 non-small cell lung cancer (NSCLC) patients undergoing immunotherapy to determine how PRO dynamics could inform when volumetric treatment progression would occur. We calibrated the tumor growth inhibition (TGI) model to patient-specific tumor volume dynamics for all volume measurements using a leave-one-out cross-validation approach. Growth parameters were divided based on progression status and sampled depending on changes in patient-reported insomnia. A cutoff analysis was performed to determine the optimal cutoff for distinguishing between responders and non-responders. Predictions were made for the Nth patient and categorized using the cutoff. Results: This study demonstrated that incorporating patient-specific changes in insomnia with a mathematical model of volume changes can predict patient response with a 72.2% true positive rate and 71.3% overall accuracy, on average 6-8 weeks sooner. Conclusion: Using this innovative framework, we can predict precisely when progression occurs, giving clinicians the opportunity to intervene beforehand.
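The specific TGI formulation used in the study is not spelled out in this summary. A common choice is a Stein-type biexponential with a regrowth rate g and a treatment-induced decay rate d, calibrated per patient by least squares. The grid-search fit and all parameter values below are illustrative assumptions, not the authors' calibration procedure.

```python
import numpy as np

def tgi_volume(t, v0, g, d):
    """One common tumor-growth-inhibition form (Stein-type biexponential):
    a regrowing tumor fraction exp(g*t) plus a treatment-killed fraction
    exp(-d*t), scaled by the baseline volume v0."""
    return v0 * (np.exp(g * t) + np.exp(-d * t) - 1.0)

def fit_tgi(t, v, g_grid=np.linspace(0.0, 0.1, 101),
            d_grid=np.linspace(0.0, 0.5, 101)):
    """Calibrate (g, d) to one patient's volume series by grid-searched
    least squares, with v0 fixed to the baseline measurement. A proper
    nonlinear optimizer would normally replace the grid."""
    v0 = v[0]
    return min(((g, d) for g in g_grid for d in d_grid),
               key=lambda p: np.sum((tgi_volume(t, v0, *p) - v) ** 2))

t = np.arange(0, 60, 6.0)             # days since treatment start (synthetic)
v = tgi_volume(t, 100.0, 0.02, 0.10)  # synthetic noiseless ground truth
g_hat, d_hat = fit_tgi(t, v)
```

In the study's framework, per-patient parameters like (g, d) would then be stratified by progression status and resampled conditional on insomnia changes to forecast when volumetric progression occurs.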
Nguyen, D. H.
Numerous studies have shown that the morphological phenotype of a cell or organoid correlates with its susceptibility to anti-cancer agents. However, traditional methods of measuring phenotype rely on spatial metrics such as area, volume, perimeter, and signal intensity, which work but are limited. These approaches cannot measure many crucial features of spatial context, such as chirality, the property of having left- and right-handedness. Volume cannot register chirality because a left shoe and a right shoe harbor the same volume. Though spatial context in the form of chirality, the direction of gravity, and the axis of polarity are intuitive notions to humans, the traditional metrics relied on by cell biologists, pathologists, radiologists, and machine learning scientists up to this point cannot register these fundamental notions. The Linearized Compressed Polar Coordinates (LCPC) Transform is a novel algorithm that can capture spatial context unlike any other metric. The LCPC Transform translates a two-dimensional (2D) contour into a discrete sinusoid wave by overlaying a grid system that tracks points of intersection between the contour and the grid lines. It turns the contour into a series of sequential pairs of discrete coordinates, with the independent coordinate (x-coordinate) being consecutive positions in 2D space. Each dependent coordinate (y-coordinate) is the distance from an intersection of the contour with a gridline to the origin of the grid system. With the data in the form of a discrete sinusoid wave, the Fast Fourier Transform is then applied. In this way, the shapes of cells in 2D and 3D cell culture are represented systematically and multidimensionally, allowing for robust quantitative stratification that will reveal insights into treatment resistance.
Summary: This article explains how novel features of morphology in cells and organoids can be measured by the Linearized Compressed Polar Coordinates (LCPC) Transform, a spatial algorithm that measures what traditional metrics, such as area, volume, and surface area, cannot. Best practices for shape orientation and alignment are discussed.
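The pipeline described above (contour → discrete wave of point-to-origin distances → Fast Fourier Transform) can be sketched in simplified form. Note the assumption: the published transform samples grid-line intersections, whereas this sketch uses the contour vertices themselves as the sample points, which preserves the contour-to-1D-wave-to-spectrum idea without the gridding step.

```python
import numpy as np

def radial_signature(xs, ys, n=64):
    """Simplified stand-in for the LCPC idea: turn a closed 2D contour
    into a discrete 1D wave of point-to-origin distances, then take the
    FFT of that wave. Returns normalized harmonic magnitudes as a shape
    descriptor. (Assumption: contour vertices are used as samples rather
    than grid-line intersections as in the published transform.)"""
    r = np.hypot(np.asarray(xs, float), np.asarray(ys, float))
    spec = np.fft.rfft(r, n=n)
    return np.abs(spec) / len(r)

# A unit circle yields a constant distance wave (all energy at the DC
# term); an ellipse's varying radius excites the second harmonic.
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = radial_signature(np.cos(theta), np.sin(theta))
ellipse = radial_signature(2 * np.cos(theta), np.sin(theta))
```

The descriptor illustrates why the Fourier view is richer than a single scalar like area: elongation, lobedness, and other periodic shape features land in distinct harmonics rather than being collapsed into one number.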